In the challenging realm of earthquake prediction, the reliability of forecasting systems has remained a
persistent obstacle. This study focuses on earthquake magnitude prediction in Indonesia, leveraging
supervised machine learning techniques and cloud radon data. We present an analysis of the tele-monitoring
system, data collection methods, and the application of regression-based machine learning algorithms.
Utilizing a dataset of 30 training instances and 105 test instances, the study evaluates multiple metrics to
ascertain the efficacy of the prediction models. Our findings reveal that the linear regression approach
yields the best earthquake magnitude prediction method, with the best values across multiple evaluation
metrics: standard deviation 0.40, mean absolute error (MAE) 0.30, mean absolute percentage error (MAPE) 6%,
root mean square error (RMSE) 0.52, mean squared error (MSE) 0.28, symmetric mean absolute percentage error
(SMAPE) 0.06, and conformal normalized mean absolute percentage error (cnSMAPE) 0.97. Additionally, we
discuss the implications of the research results and the potential applications in enhancing existing
earthquake prediction methodologies.

Keywords: Cloud data, Earthquake magnitude, Machine learning, Radon, Supervised
1. INTRODUCTION

Earthquake prediction has long been a formidable challenge, marked by the absence of a dependable
forecasting system [1]. Various studies have attempted to anticipate seismic events by analysing diverse
precursory indicators, including animal behaviour, temperature fluctuations, changes in radon gas emissions,
and alterations in seismicity patterns [2], [3]. However, because these indicators do not appear
consistently before earthquakes, standardizing and generalizing these prediction methods has proven
difficult [4]. Among these indicators, radon gas has garnered significant attention as a potential precursor
to seismic activity [5]. Moreover, replicable patterns of radon change linked to seismic activity have been
reported, particularly in the lead-up to recent earthquakes [6]. While several studies have explored the use
of radon gas concentration data in earthquake prediction, an accurate forecasting system that specifies
event details such as date, time, magnitude, and location has remained elusive [7]-[14].

The potential occurrence of an earthquake highlights the importance of precise prediction, which can
save lives and prevent damage to infrastructure. However, due to the inherent probabilistic nature
of earthquakes and the difficulty of establishing an effective and reliable prediction model, attempts to
forecast earthquakes have produced inconsistent outcomes [15]. Recent advancements in technology have led
to the application of machine learning techniques in earthquake prediction, utilizing data related to animal
behaviour, meteorological parameters, groundwater levels, chemical dynamics, seismic patterns, and
historical earthquake data [14], [16]-[22]. Within the broader realm of machine learning, predictive
modelling achieves enhanced accuracy by minimizing errors within the model [23]. Despite these efforts,
accurate short-term earthquake predictions have remained elusive, specifically concerning the magnitude and
location of seismic events on the Eurasian and Indo-Australian Plates [24].

Research by Zhang et al. [25], for instance, has focused on constructing four models using the 
extreme gradient boosting method to examine the mechanisms of radon variation under both natural and 
seismic conditions. The analysis highlighted the significant impact of various factors such as spring 
discharge, water temperature, precipitation, barometric pressure, and antecedent radon on radon anomalies, 
elucidating that these anomalies are likely induced by the earthquake-driven formation of microfractures in 
rock. Notably, the presence of ten megathrust subduction zones between the Eurasian and Indo-Australian 
Plates underscores the necessity for a robust earthquake magnitude prediction algorithm based on the 
fluctuation of radon gas concentration within one to four days before seismic events of magnitude above 
M4.5 [16]. 

However, despite these advancements, the correlation between earthquakes and radon anomalies has
not been definitively established, leading to questions about the efficacy of proposed models [26]. Notably,
the implementation of the belief rule-based expert system (BRBES), which considers data about animal
behaviour, environmental dynamics, and chemical changes, has shown promising results in predicting
earthquake occurrences within a 12-hour timeframe [17]. Similarly, research on the seismic cycle based on
historical data, utilising an expert system, has exhibited accurate detection of impending earthquakes within
12 hours, with magnitudes varying from M3.6 to M9.1 and with location resolved to one quarter of the Earth
[18]. In the work of Tehseen et al. [24], the proposed expert system achieved an accuracy below 70% on an
independent test set, for magnitudes ranging from M0.1 to M5.9.

Moreover, the contemporary shift towards the integration of machine learning and deep learning
methodologies in earthquake prediction has led to substantial advancements in the field [24].
However, challenges persist, particularly concerning the accurate prediction of rare high-magnitude 
earthquakes and the inherent unpredictability of their timing and location [14]. This study aims to address 
these challenges by analysing an earthquake magnitude prediction algorithm that focuses on the fluctuation 
of radon gas concentration in the days leading up to seismic events of magnitude above M4.5 between the 
Eurasian and Indo-Australian Plates. Through the implementation of a supervised machine learning 
approach, this research endeavours to contribute to the existing body of knowledge on earthquake prediction 
methodologies. 


2. METHOD 

The real-time radon gas concentration telemonitoring system is installed close to an active fault in
Yogyakarta, Indonesia, an area vulnerable to seismic activity. The radon gas transducer is placed above
ground level in the chamber room, at a maximum distance of 4.142 cm, to measure radon gas emissions
effectively. Radon gas measurements are updated every 10 minutes to negate radiation emissions from Actinium
and Thoron [27]. Figure 1 shows the earthquake prediction system design. Data from the transducer is passed
to the microprocessor and sent to the cloud server, so that the measurements can be monitored in real time
wherever internet access is available. Radon gas concentration measurement data is stored in a data storage
server and displayed on a web server, while earthquake data comes from GEOFON Potsdam and the Indonesian
Agency for Meteorology, Climatology, and Geophysics.

Radon cloud data and earthquake data are then used to derive the earthquake magnitude
prediction algorithm based on the supervised machine learning method. The results of this model are
evaluated using mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square
error (RMSE), mean squared error (MSE), symmetric mean absolute percentage error (SMAPE), and
conformal normalized mean absolute percentage error (cnSMAPE). The model with the best values can then
be deployed on the cloud server to turn incoming data into an earthquake prediction notification.
Table 1 shows the radon data set composition based on the method by Pratama [16]. Data on radon gas
concentrations and earthquake events are tabulated in Table 2. The training and test data consist of the
radon gas concentration data on each day an earthquake prediction is issued by the method of Pratama [16],
paired with the earthquake events of magnitude above M4.5 between the Eurasian and Indo-Australian Plates
that occur 1-4 days after that prediction day. The training data were collected from 15/9/2019 to
22/03/2020 (30 data), and the test data from 6/4/2020 until 31/12/2022 (105 data).

Figure 1. Earthquake prediction system design


Table 1. Data set composition [16] 


Variable Description 
n The day when the algorithm prediction was completed based on the method of Pratama [16] 
Rn Radon average day n 
R(n-1) Radon average day n-1 
R(n-2) Radon average day n-2 
R(n-6) Radon average day n-6 
R(n-7) Radon average day n-7 
DR(n-3) Radon average 3 days before R(n-2) = average R(n-3) until R(n-5) 
DR(n-7) Radon average 7 days before R(n-2) = average R(n-3) until R(n-9) 
DR(n-14) Radon average 14 days before R(n-2) = average R(n-3) until R(n-17) 
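
As a rough illustration of how the variables in Table 1 could be assembled, the following Python sketch builds one feature row from a list of daily radon averages. The function name `build_features`, the input layout (oldest day first) and the dictionary keys are assumptions made for this example and are not part of the method of Pratama [16]; the averaging windows follow the definitions in Table 1 as printed, and the three-day average is named DR(n-3) as in Table 2.

```python
from statistics import mean


def build_features(daily_radon, n):
    """Return Table 1-style features for prediction day index n.

    daily_radon: daily radon averages, oldest first (hypothetical input layout).
    n: index of the prediction day; at least 17 earlier days are required.
    """
    if n < 17 or n >= len(daily_radon):
        raise ValueError("need a valid index n with at least 17 preceding days")

    def r(k):
        # Radon average k days before day n, i.e. R(n-k).
        return daily_radon[n - k]

    def window(first, last):
        # Average of R(n-first) through R(n-last), inclusive.
        return mean(r(k) for k in range(first, last + 1))

    # Daily values R(n-1) .. R(n-7), as used in the columns of Table 2.
    features = {f"R(n-{k})": r(k) for k in range(1, 8)}
    # Windowed averages, with bounds taken from Table 1.
    features["DR(n-3)"] = window(3, 5)
    features["DR(n-7)"] = window(3, 9)
    features["DR(n-14)"] = window(3, 17)
    return features
```

For the 5-Dec-22 row of Table 2, R(n-3) = 2.47, R(n-4) = 3.51 and R(n-5) = 2.60 average to 2.86, which agrees with the tabulated DR(n-3).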


Table 2. Example of dataset

Earthquake date  DR      DR     DR     R      R      R      R      R      R      R      Earthquake  Distance  Actual
prediction       (n-14)  (n-7)  (n-3)  (n-7)  (n-6)  (n-5)  (n-4)  (n-3)  (n-2)  (n-1)  date        (km)      magnitude
7-Nov-22         4.34    2.95   2.85   2.45   2.65   1.63   3.17   3.76   2.08   3.83   11-Nov-22   495.70    5.0
13-Nov-22        3.14    3.38   3.70   3.83   2.88   3.14   3.59   4.36   2.78   5.09   14-Nov-22   216.30    5.4
13-Nov-22        3.14    3.38   3.70   3.83   2.88   3.14   3.59   4.36   2.78   5.09   16-Nov-22   1124.37   5.6
18-Nov-22        9.62    16.32  32.81  2.78   5.09   20.33  23.05  55.04  15.19  41.97  21-Nov-22   380.36    5.6
29-Nov-22        18.29   2.93   3.35   2.59   1.16   1.20   5.36   3.50   3.11   1.83   3-Dec-22    311.97    6.1
4-Dec-22         8.27    3.45   3.46   3.11   1.83   4.27   2.60   3.51   2.47   2.00   6-Dec-22    462.60    6.2
5-Dec-22         6.12    3.04   2.86   1.83   4.27   2.60   3.51   2.47   2.00   3.40   8-Dec-22    378.19    5.8
9-Dec-22         2.93    2.93   3.32   2.47   2.00   3.40   4.06   2.50   2.14   3.90   13-Dec-22   584.16    5.2
13-Dec-22        4.35    5.88   9.68   2.50   2.14   3.90   15.78  9.37   3.85   2.44   17-Dec-22   611.76    5.1
14-Dec-22        4.41    5.94   9.67   2.14   3.90   15.78  9.37   3.85   2.44   2.50   18-Dec-22   903.52    5.1
16-Dec-22        4.32    5.71   2.93   15.78  9.37   3.85   2.44   2.50   2.30   2.40   19-Dec-22   706.85    5.3
23-Dec-22        4.40    3.09   4.26   1.79   2.37   3.23   4.32   5.24   3.07   1.89   25-Dec-22   210.73    5.3


The learning process used in this study is supervised learning with a regression method, as shown in
Figure 2. The goal is for the model to learn the underlying patterns or relationships in the data so that
it can make precise earthquake magnitude predictions on new, unseen data. The machine learning
techniques used to derive earthquake magnitude prediction algorithms are linear regression, tree,
AdaBoost, Xtreme gradient boosting, and random forest [28]-[35]. The training data are used to build the
earthquake magnitude prediction model, and the test data are then used to test the resulting
model.
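
To make the train-then-test idea concrete, the sketch below fits an ordinary least squares linear regression in plain Python by solving the normal equations, and then predicts on an unseen row. It is only an illustration under stated assumptions: the function names `fit_linear_regression` and `predict` are hypothetical, the numbers are synthetic rather than taken from the study's data set, and the solver is not claimed to match the one used by the software in this study.

```python
def fit_linear_regression(features, targets):
    """Least-squares fit via the normal equations (X^T X) w = X^T y.

    Returns [intercept, w1, w2, ...]. Illustrative helper, not the study's code.
    """
    rows = [[1.0] + [float(v) for v in row] for row in features]  # add intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = []
    for i in range(p):
        line = [sum(r[i] * r[j] for r in rows) for j in range(p)]
        line.append(sum(r[i] * t for r, t in zip(rows, targets)))
        aug.append(line)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot solve")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for k in range(p):
            if k != col:
                factor = aug[k][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][p] for i in range(p)]


def predict(weights, row):
    """Predicted target for one feature row."""
    return weights[0] + sum(w * float(v) for w, v in zip(weights[1:], row))


# Synthetic numbers only (not data from the study): target = 1 + 2*x1 + 3*x2.
train_x = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]
train_y = [1 + 2 * a + 3 * b for a, b in train_x]
weights = fit_linear_regression(train_x, train_y)
unseen_prediction = predict(weights, [2, 2])
```

With only 30 training rows and about ten radon features, a fit of this kind is sensitive to collinear columns, which is why the sketch raises an error instead of returning unstable weights.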

In this study, the linear regression, tree, AdaBoost, Xtreme gradient boosting, and random forest
methods were run in Orange Data Mining version 3.35.0. The evaluation metrics used are MAE, MAPE,
RMSE, MSE, SMAPE, and cnSMAPE. Evaluating the models with several metrics gives a more
comprehensive picture of how each model performs on the data used.
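
The text does not spell out the formulas behind these metrics, so the following Python sketch uses the common textbook definitions. The function name `regression_errors` is hypothetical, and the form cnSMAPE = 1 - SMAPE/2 is an assumption, chosen because it is consistent with the values in Table 4 (for example, 1 - 0.06/2 = 0.97).

```python
import math


def regression_errors(actual, predicted):
    """Common regression error metrics for paired actual/predicted values.

    Assumes every actual value is non-zero (true for earthquake magnitudes here).
    """
    n = len(actual)
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    mae = sum(abs_err) / n
    mse = sum(e * e for e in abs_err) / n
    rmse = math.sqrt(mse)
    # MAPE as a percentage of the actual value.
    mape = 100.0 * sum(e / abs(a) for e, a in zip(abs_err, actual)) / n
    # SMAPE as a fraction in the range 0..2.
    smape = sum(
        2.0 * e / (abs(a) + abs(p))
        for e, a, p in zip(abs_err, actual, predicted)
    ) / n
    # Assumed definition: rescales SMAPE to 0..1, where higher is better.
    cnsmape = 1.0 - smape / 2.0
    return {
        "MAE": mae,
        "MAPE": mape,
        "RMSE": rmse,
        "MSE": mse,
        "SMAPE": smape,
        "cnSMAPE": cnsmape,
    }
```

Under this definition cnSMAPE is the only metric in the list for which a larger value indicates a better model.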


Figure 2. Scheme of a supervised machine learning model


3. RESULTS AND DISCUSSION 

This study predicts earthquake magnitude using a supervised machine learning method. The
techniques used to derive the prediction algorithms are linear regression, tree, AdaBoost, Xtreme
gradient boosting, and random forest, trained on 30 data and tested on 105 data. The feature settings
were adjusted for each machine learning method to get the best results. The output of the process is the
predicted magnitude of the upcoming earthquake for each entry in the test data. Earthquake predictions
are valid for 1-4 days after the prediction, following the method of Pratama et al. [7], and apply to
locations from Aceh to East Nusa Tenggara, Indonesia.

Table 3 recapitulates the predictions made with the supervised machine learning methods, in terms of
confusion-matrix counts and the standard deviation of the difference between actual magnitude and
predicted magnitude. A true positive is counted when the actual magnitude is within the range of the
predicted magnitude ± the standard deviation of the error, and a false positive when the actual
magnitude is outside that range. The precision of each machine learning method is
calculated by [32]:
$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{1}$$

Linear regression has the highest precision, 0.82, with the most true positives (86), followed by
AdaBoost with 0.80. Xtreme gradient boosting has the lowest precision, 0.71, with the most false
positives (30).
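
The counting rule behind Table 3 can be sketched as follows. This is an illustration only: the function name `precision_within_band` is hypothetical, the band is read as the predicted magnitude plus or minus the error standard deviation, and the use of the sample standard deviation when no band is supplied is an assumption, since the text does not say which estimator was used.

```python
from statistics import stdev


def precision_within_band(actual, predicted, band=None):
    """Count hits inside predicted +/- band and return (tp, fp, precision).

    band: half-width of the acceptance range; defaults to the sample standard
    deviation of the signed errors (an assumption, see the lead-in).
    """
    errors = [a - p for a, p in zip(actual, predicted)]
    if band is None:
        band = stdev(errors)
    true_pos = sum(1 for e in errors if abs(e) <= band)
    false_pos = len(errors) - true_pos
    return true_pos, false_pos, true_pos / (true_pos + false_pos)
```

With the linear regression counts reported in Table 3, the same ratio gives 86 / (86 + 19), which rounds to 0.82.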


Table 3. Machine learning data test result 


Parameter           Linear regression  Tree  AdaBoost  Xtreme gradient boosting  Random forest
Standard deviation  0.52               0.71  0.61      0.68                      0.58
True positive       86                 79    84        75                        76
False positive      19                 26    21        30                        29
Precision           0.82               0.75  0.80      0.71                      0.72


Some error evaluations of machine learning methods include relative error, MAE, MAPE, RMSE,
MSE, SMAPE, and cnSMAPE. Table 4 shows the error evaluation of the earthquake magnitude prediction
methods. The linear regression method has the lowest standard deviation (0.40), MAE (0.30), MAPE (6%),
RMSE (0.52), MSE (0.28), and SMAPE (0.06), and the highest cnSMAPE (0.97), compared to the other
machine learning methods. Lower values indicate better performance for every metric except cnSMAPE,
for which a higher value is better. Therefore, since linear regression has the best value on all these
metrics, it is considered the best method for predicting earthquake magnitude based on the steps used in
this research. The tree method has the worst evaluation results, with the highest standard deviation
(0.48), MAE (0.50), MAPE (10%), RMSE (0.71), MSE (0.50), and SMAPE (0.09), and the lowest cnSMAPE
(0.95). Compared to the other machine learning methods, the tree method is therefore the worst for
predicting earthquake magnitude on the data set used in this study. To show the error characteristics,
Figure 3 presents the dispersion of the errors as a boxplot for each method. The tree method has the
highest error dispersion, followed by Xtreme gradient boosting, random forest, AdaBoost, and linear
regression, which has the lowest error dispersion and can therefore be stated as the best method for
predicting earthquake magnitude using this data set.


Table 4. Earthquake magnitude prediction error evaluation 


Error index               St dev of absolute error  MAE   MAPE (%)  RMSE  MSE   SMAPE  cnSMAPE
Linear regression         0.40                      0.30  6%        0.52  0.28  0.06   0.97
Tree                      0.48                      0.50  10%       0.71  0.50  0.09   0.95
AdaBoost                  0.46                      0.40  7%        0.61  0.38  0.08   0.96
Xtreme gradient boosting  0.46                      0.50  9%        0.67  0.45  0.09   0.95
Random forest             0.40                      0.40  8%        0.58  0.34  0.08   0.96

Figure 3. Boxplot produced by machine learning algorithms when predicting the 105 test data

To analyze in detail the sign of the deviation produced by the methods in predicting earthquake
magnitude, Figure 4 shows the histogram of errors for Xtreme gradient boosting (Figure 4(a)), linear
regression (Figure 4(b)), AdaBoost (Figure 4(c)), random forest (Figure 4(d)), and tree (Figure 4(e)).
The Xtreme gradient boosting, linear regression and AdaBoost methods have their highest frequency at an
error of 0 M, while the random forest and tree methods peak at -0.5 M and -0.25 M, respectively. Linear
regression has the highest frequency at 0 M, with 33 cases, followed by 26 cases at -0.25 M and 19 cases
at -0.5 M. This also indicates that the linear regression method is the best in predicting earthquake
magnitude based on the test data in this study.

Figure 4. Histograms of the errors produced by (a) Xtreme gradient boosting, (b) linear regression,
(c) AdaBoost, (d) random forest, and (e) tree algorithms when predicting the 105 test data

In this study, the errors were also analyzed by magnitude range, as shown in Table 5. The linear
regression method has the lowest MAE in the M5.1-M5.3 range, 0.08, and MAEs of 0.28 and 0.22 in the
M4.8-M5.0 and M5.4-M5.6 ranges, respectively. The standard deviation of its absolute error is also low
for most magnitude ranges, with the lowest value, 0.08, in the M5.1-M5.3 range. The Xtreme gradient
boosting method has the lowest absolute error standard deviation for the M5.7-M5.9 range and for
magnitudes over M6.5, with values of 0.12 and 0.43, respectively. In the M5.1-M5.3 range, the AdaBoost
method has an MAE of 0.22, which is lower than the 0.24 of the random forest method. The AdaBoost
method has an MAE of 0.29 in both the M4.8-M5.0 and M5.4-M5.6 ranges. Earthquakes with magnitudes
greater than M6.2 are rare and cannot be produced on demand, so few such events are available for
training. More data would let the system learn more, so that it can predict earthquake magnitudes more
precisely and accurately.


Table 5. Evaluation of the absolute errors based on the actual magnitude range produced by machine learning
algorithms when predicting the 105 test data

Actual magnitude  Absolute error mean                    Absolute error standard deviation
range (M)         XGB    LR     AdaBoost  RF     Tree    XGB    LR     AdaBoost  RF     Tree
4.5-4.7           0.58   0.62   0.62      0.54   0.60    0.24   0.11   0.11      0.25   0.19
4.8-5             0.38   0.28   0.29      0.37   0.32    0.42   0.11   0.40      0.31   0.46
5.1-5.3           0.34   0.08   0.22      0.24   0.46    0.36   0.08   0.25      0.22   0.45
5.4-5.6           0.40   0.22   0.29      0.32   0.41    0.29   0.13   0.18      0.20   0.27
5.7-5.9           0.70   0.64   0.54      0.58   0.66    0.12   0.17   0.32      0.19   0.26
6-6.2             0.88   0.90   0.96      0.82   0.98    0.88   0.10   0.18      0.28   0.19
6.3-6.5           1.40   1.10   1.20      1.40   1.50    -      -      -         -      -
>6.5              1.78   1.80   1.93      1.70   1.63    0.43   0.47   0.51      0.48   0.63

XGB: Xtreme gradient boosting; LR: linear regression; RF: random forest.
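
A per-range breakdown like the one in Table 5 can be sketched in Python as below. The bin edges are copied from the table; the function name `errors_by_magnitude_range`, the rounding of magnitudes to one decimal place, and the choice to report no standard deviation for a bin with a single event are assumptions of this example (the last one would be consistent with the dashes in the 6.3-6.5 row, though the text does not state the reason for them).

```python
from statistics import mean, stdev

# (label, lower bound, upper bound) in tenths of a magnitude unit, from Table 5.
MAGNITUDE_BINS = [
    ("4.5-4.7", 45, 47),
    ("4.8-5", 48, 50),
    ("5.1-5.3", 51, 53),
    ("5.4-5.6", 54, 56),
    ("5.7-5.9", 57, 59),
    ("6-6.2", 60, 62),
    ("6.3-6.5", 63, 65),
    (">6.5", 66, 10**9),
]


def errors_by_magnitude_range(actual, predicted):
    """Group absolute errors by actual-magnitude bin.

    Returns {label: (mean_abs_error, stdev_abs_error_or_None)} for non-empty bins.
    """
    groups = {}
    for a, p in zip(actual, predicted):
        tenths = round(a * 10)  # work in integer tenths to avoid float edge cases
        for label, low, high in MAGNITUDE_BINS:
            if low <= tenths <= high:
                groups.setdefault(label, []).append(abs(a - p))
                break
    summary = {}
    for label, errs in groups.items():
        spread = stdev(errs) if len(errs) > 1 else None  # undefined for one sample
        summary[label] = (mean(errs), spread)
    return summary
```

Magnitudes below 4.5 fall outside every bin and are ignored by this sketch, which matches the M4.5 threshold used throughout the study.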


4. CONCLUSION 

The results demonstrated the effectiveness of the linear regression method in predicting earthquake
magnitudes, with the best values across multiple evaluation metrics: the lowest standard deviation (0.40),
MAE (0.30), MAPE (6%), RMSE (0.52), MSE (0.28), and SMAPE (0.06), and the highest cnSMAPE (0.97). With
these results, the linear regression model will be implemented in the cloud server of the earthquake early
warning system that has been created. These findings underscore the potential of our approach to improve
real-world disaster preparedness and mitigation efforts. Although challenges remain in predicting rare
high-magnitude earthquakes, the study provides a significant advancement in the field. Future research
may incorporate more data to improve the precision of earthquake magnitude predictions, further
contributing to the body of knowledge on this critical area of research.